Interpreting BLEU/NIST Scores: How Much Improvement do We Need to Have a Better System?
Abstract
Automatic evaluation metrics for Machine Translation (MT) systems, such as BLEU and the related NIST metric, are becoming increasingly important in MT. Yet, their behaviors are not fully understood. In this paper, we analyze some flaws in the BLEU/NIST metrics. With a better understanding of these problems, we can better interpret the reported BLEU/NIST scores. In addition, this paper reports a novel method of calculating the confidence intervals for BLEU/NIST scores using bootstrapping. With this method, we can determine whether two MT systems are significantly different from each other.
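The abstract does not spell out the bootstrapping procedure, so the following is only a minimal sketch of one common variant: a percentile bootstrap that resamples test-set sentences with replacement and recomputes a corpus-level score each time. It assumes a single reference per sentence, a simplified unsmoothed BLEU up to 4-grams, and whitespace-tokenized input; the function names (`sentence_stats`, `corpus_bleu`, `bootstrap_ci`) are hypothetical and not taken from the paper.

```python
import math
import random
from collections import Counter

MAX_N = 4  # highest n-gram order (assumption: standard BLEU-4)


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_stats(hyp, ref):
    """Per-sentence statistics: clipped matches and totals per order, plus lengths."""
    matches, totals = [], []
    for n in range(1, MAX_N + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        matches.append(sum(min(c, r[g]) for g, c in h.items()))
        totals.append(sum(h.values()))
    return matches, totals, len(hyp), len(ref)


def corpus_bleu(stats):
    """Simplified corpus-level BLEU (single reference, no smoothing)."""
    m = [sum(s[0][n] for s in stats) for n in range(MAX_N)]
    t = [sum(s[1][n] for s in stats) for n in range(MAX_N)]
    hyp_len = sum(s[2] for s in stats)
    ref_len = sum(s[3] for s in stats)
    if hyp_len == 0 or min(m) == 0:
        return 0.0
    log_prec = sum(math.log(m[n] / t[n]) for n in range(MAX_N)) / MAX_N
    # Brevity penalty applies only when the hypothesis corpus is shorter.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


def bootstrap_ci(stats, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample sentences with replacement."""
    rng = random.Random(seed)
    k = len(stats)
    scores = sorted(
        corpus_bleu([stats[rng.randrange(k)] for _ in range(k)])
        for _ in range(n_resamples)
    )
    lo = scores[int((alpha / 2) * n_resamples)]
    hi = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi
```

Under these assumptions, one would compute `sentence_stats` once per test sentence and pass the list to `bootstrap_ci` to get an approximate 95% interval. To compare two systems, a paired variant that resamples the same sentence indices for both systems and examines the distribution of score differences is a common choice, though whether it matches the paper's method is not stated in the abstract.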
Similar resources
Rule-Based Translation of Spanish Verb-Noun Combinations into Basque
This paper presents a method to improve the translation of Verb-Noun Combinations (VNCs) in a rule-based Machine Translation (MT) system for Spanish-Basque. Linguistic information about a set of VNCs is gathered from the public database Konbitzul, and it is integrated into the MT system, leading to an improvement in BLEU, NIST and TER scores, as well as the results being significantly better acc...
Experiments on Language Normalization for Spanish to English Machine Translation
We describe a trained system we have built, using freely available statistical packages and present experiments specifically designed to improve the performance of the system in the Spanish into English task. We performed both qualitative and quantitative evaluations that show better perceived translation quality as well as better BLEU and NIST scores.
A Look inside the ITC-irst SMT System
This paper presents a look inside the ITC-irst large-vocabulary SMT system developed for the NIST 2005 Chinese-to-English evaluation campaign. Experiments on official NIST test sets provide a thorough overview of the performance of the system, supplying information on how single components contribute to the global performance. The presented system exhibits performance comparable to that of the b...
Machine Translation on the Medical Domain: The Role of BLEU/NIST and METEOR in a Controlled Vocabulary Setting
The main objective of our project is to extract clinical information from thoracic radiology reports in Portuguese using Machine Translation (MT) and cross language information retrieval techniques. To accomplish this task we need to evaluate the involved machine translation system. Since human MT evaluation is costly and time consuming we opted to use automated methods. We propose an evaluatio...
Voting on N-grams for Machine Translation System Combination
System combination exploits differences between machine translation systems to form a combined translation from several system outputs. Core to this process are features that reward n-gram matches between a candidate combination and each system output. Systems differ in performance at the n-gram level despite similar overall scores. We therefore advocate a new feature formulation: for each syst...